With the progress of AI democratization, machine learning (ML) has been successfully applied to edge applications such as smartphones and automated driving. Nowadays, more applications require ML on tiny devices with extremely limited resources, like implantable cardioverter defibrillators (ICDs), which is known as TinyML. Unlike ML on the edge, TinyML with a limited energy supply has a higher demand for low-power execution. Stochastic computing (SC), which represents data as bit streams, is promising because it can perform fundamental ML operations using simple logic gates instead of complicated binary adders and multipliers. However, SC commonly suffers from low accuracy on ML tasks due to its low data precision and the inaccuracy of its arithmetic units. Increasing the bit-stream length, as existing works do, can mitigate the accuracy issue but incurs higher latency. In this work, we propose a novel SC architecture, namely block-based stochastic computing (BSC). BSC divides inputs into blocks so that latency can be reduced by exploiting high data parallelism. Moreover, optimized arithmetic units and an output revision (OUR) scheme are proposed to improve accuracy. On top of that, a global optimization approach is devised to determine the number of blocks, which can improve the latency-power trade-off. Experimental results show that BSC can outperform existing designs, achieving over 10% higher accuracy on ML tasks and a reduction of more than 6x in power.
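The premise that simple logic gates can stand in for multipliers is standard in stochastic computing; the minimal Python sketch below (our own illustration, not the authors' BSC implementation) shows how a bitwise AND of two unipolar bit streams approximates a product, and how splitting the stream into blocks, an assumed stand-in for BSC's block-level parallelism, shortens the per-block latency.

```python
import numpy as np

def to_bitstream(x, length, rng):
    """Encode a probability x in [0, 1] as a unipolar stochastic bit stream."""
    return (rng.random(length) < x).astype(np.uint8)

def sc_multiply(x, y, length=1024, blocks=1, rng=None):
    """Multiply two values in [0, 1] with stochastic computing.

    The product of two independent unipolar bit streams is just their bitwise
    AND. Splitting the stream into `blocks` shorter streams that could be
    processed in parallel mimics the latency-reduction idea of a block-based
    design: each block only needs length/blocks clock cycles.
    """
    rng = rng or np.random.default_rng(0)
    block_len = length // blocks
    estimates = []
    for _ in range(blocks):
        bx = to_bitstream(x, block_len, rng)
        by = to_bitstream(y, block_len, rng)
        estimates.append(np.mean(bx & by))  # AND gate acts as a multiplier
    return float(np.mean(estimates))        # average the per-block estimates

print(sc_multiply(0.6, 0.5, length=4096, blocks=8))  # approximately 0.30
```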
Scene text editing (STE) aims to replace text with the desired one while preserving background and styles of the original text. However, due to the complicated background textures and various text styles, existing methods fall short in generating clear and legible edited text images. In this study, we attribute the poor editing performance to two problems: 1) Implicit decoupling structure. Previous methods of editing the whole image have to learn different translation rules of background and text regions simultaneously. 2) Domain gap. Due to the lack of edited real scene text images, the network can only be well trained on synthetic pairs and performs poorly on real-world images. To handle the above problems, we propose a novel network by MOdifying Scene Text image at strokE Level (MOSTEL). Firstly, we generate stroke guidance maps to explicitly indicate regions to be edited. Different from the implicit one by directly modifying all the pixels at image level, such explicit instructions filter out the distractions from background and guide the network to focus on editing rules of text regions. Secondly, we propose a Semi-supervised Hybrid Learning to train the network with both labeled synthetic images and unpaired real scene text images. Thus, the STE model is adapted to real-world datasets distributions. Moreover, two new datasets (Tamper-Syn2k and Tamper-Scene) are proposed to fill the blank of public evaluation datasets. Extensive experiments demonstrate that our MOSTEL outperforms previous methods both qualitatively and quantitatively. Datasets and code will be available at https://github.com/qqqyd/MOSTEL.
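As a hedged sketch of the stroke-guidance idea (our own illustration, not the MOSTEL architecture), an explicit stroke mask can restrict edits to text-region pixels while background pixels are passed through unchanged, which is the intuition behind filtering out background distractions:

```python
import numpy as np

def apply_stroke_guidance(original, edited, stroke_mask):
    """Composite an edited text image using an explicit stroke guidance map.

    original, edited: float arrays of shape (H, W, 3) in [0, 1]
    stroke_mask:      float array of shape (H, W, 1) in [0, 1], where 1 marks
                      stroke regions that are allowed to change.
    Pixels outside the stroke mask keep the original background, so a model
    only has to learn translation rules for the text regions.
    """
    return stroke_mask * edited + (1.0 - stroke_mask) * original

# toy usage
h, w = 4, 4
original = np.zeros((h, w, 3))
edited = np.ones((h, w, 3))
mask = np.zeros((h, w, 1)); mask[1:3, 1:3] = 1.0
print(apply_stroke_guidance(original, edited, mask)[..., 0])
```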
Responding with multi-modal content has been recognized as an essential capability for an intelligent conversational agent. In this paper, we introduce the MMDialog dataset to better facilitate multi-modal conversation. MMDialog is composed of a curated set of 1.08 million real-world dialogues with 1.53 million unique images across 4,184 topics. MMDialog has two main and unique advantages. First, it is the largest multi-modal conversation dataset by the number of dialogues, by 8x. Second, it contains a massive number of topics to generalize to the open domain. To build an engaging dialogue system with this dataset, we propose and normalize two response-producing tasks based on retrieval and generative scenarios. In addition, we build two baselines for the above tasks with state-of-the-art techniques and report their experimental performance. We also propose a novel evaluation metric, MM-Relevance, to measure the multi-modal responses. Our dataset and scripts are available at https://github.com/victorsungo/MMDialog.
Video super-resolution is one of the most popular tasks on mobile devices, being widely used for automatic improvement of low-bitrate and low-resolution video streams. While numerous solutions have been proposed for this problem, they are usually quite computationally demanding, demonstrating low FPS rates and poor power efficiency on mobile devices. In this Mobile AI challenge, we address this problem and task the participants with designing an end-to-end real-time video super-resolution solution for mobile NPUs optimized for low energy consumption. The participants were provided with the REDS training dataset containing video sequences for a 4X video upscaling task. The runtime and power efficiency of all models was evaluated on the powerful MediaTek Dimensity 9000 platform with a dedicated AI processing unit capable of accelerating floating-point and quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating up to a 500 FPS rate and 0.2 [Watt / 30 FPS] power consumption. A detailed description of all models developed in the challenge is provided in this paper.
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and task the participants with designing an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating up to a 60 FPS rate when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
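As a generic illustration of the kind of NPU-friendly design used in such challenges (not any particular submission), efficient super-resolution networks typically end with a sub-pixel (depth-to-space) layer that rearranges channels into a 3x larger image. The NumPy sketch below shows only that rearrangement step:

```python
import numpy as np

def depth_to_space(x, scale=3):
    """Rearrange a (H, W, C*scale^2) feature map into a (H*scale, W*scale, C)
    image -- the sub-pixel upsampling step commonly placed at the end of
    quantization-friendly super-resolution networks. A generic sketch, not a
    specific challenge entry.
    """
    h, w, c = x.shape
    c_out = c // (scale * scale)
    x = x.reshape(h, w, scale, scale, c_out)
    x = x.transpose(0, 2, 1, 3, 4)               # interleave rows and columns
    return x.reshape(h * scale, w * scale, c_out)

features = np.random.default_rng(0).random((64, 64, 3 * 3 * 3))  # low-res features
print(depth_to_space(features, scale=3).shape)   # (192, 192, 3)
```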
Based on the physical features of Raman amplification, we propose a three-step modeling scheme built on neural networks (NN) and linear regression. Compared with the purely NN-based approach, simulations demonstrate higher accuracy, lower data requirements, and lower computational complexity.
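The abstract does not spell out the three steps, so the following is only one plausible hybrid pattern, sketched under the assumption that a linear model captures the dominant trend and a small NN corrects the residual; the toy data and feature names are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

# Hypothetical stand-in for gain-profile data: inputs might be pump powers and
# wavelengths, the target a gain value. This is not the authors' scheme.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 3))
y = 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.3 * np.sin(6 * X[:, 2]) + 0.01 * rng.standard_normal(500)

linear = LinearRegression().fit(X, y)                 # fit the dominant linear trend
residual = y - linear.predict(X)
nn = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                  random_state=0).fit(X, residual)    # small NN on the residual

y_hat = linear.predict(X) + nn.predict(X)             # combine the two models
print(f"RMSE: {np.sqrt(np.mean((y - y_hat) ** 2)):.4f}")
```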
Figure skating scoring is a challenging task because it requires judging the technical moves of the skater as well as their coordination with the background music. Previous learning-based works cannot solve it well for two reasons: 1) each move changes rapidly, so simply applying traditional frame sampling loses a lot of valuable information, especially in 3-5 minute videos, which makes extremely long-range representation learning necessary; 2) prior methods rarely considered the key audio-visual relationship in their models. We therefore introduce a multi-modal MLP architecture named Skating-Mixer. It extends the MLP-Mixer-based framework to a multi-modal fashion and effectively learns long-term representations through our designed memory recurrent unit (MRU). Besides the model, we also collected a high-quality audio-visual FS1000 dataset, which contains over 1000 videos of 8 types of programs with 7 different rating metrics, surpassing other datasets in both quantity and diversity. Experiments show that the proposed method outperforms prior work on all major metrics on the public Fis-V and our FS1000 datasets. In addition, we include an analysis applying our method to recent competitions at the Beijing 2022 Winter Olympic Games, demonstrating that our method has strong robustness.
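The abstract does not give the MRU equations, so the sketch below is only a generic gated memory update, assumed for illustration, showing how per-clip features can be folded into a single long-range representation of a multi-minute video:

```python
import numpy as np

def memory_aggregate(clip_features, hidden_dim=None, rng=None):
    """Aggregate per-clip features into one long-range video representation
    with a simple gated memory update. A generic sketch of a memory recurrent
    unit, not the Skating-Mixer MRU.
    """
    rng = rng or np.random.default_rng(0)
    d = clip_features.shape[-1]
    hidden_dim = hidden_dim or d
    Wg = rng.standard_normal((d + hidden_dim, hidden_dim)) * 0.01  # gate weights
    Wc = rng.standard_normal((d + hidden_dim, hidden_dim)) * 0.01  # candidate weights
    memory = np.zeros(hidden_dim)
    for x in clip_features:                      # iterate over clips in time order
        z = np.concatenate([x, memory])
        gate = 1.0 / (1.0 + np.exp(-(z @ Wg)))   # how much of the memory to rewrite
        cand = np.tanh(z @ Wc)                   # candidate memory content
        memory = (1.0 - gate) * memory + gate * cand
    return memory

video = np.random.default_rng(1).standard_normal((120, 64))  # 120 clips x 64-dim features
print(memory_aggregate(video).shape)  # (64,)
```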
Quantum computing has a superior advantage in solving specific problems, such as integer factorization and Simon's problem. For more general tasks in machine learning, more and more quantum algorithms based on variational quantum circuits have been proposed recently, especially for supervised and unsupervised learning. However, little work has been done in reinforcement learning, which is arguably more important and more challenging. Previous work on quantum reinforcement learning mainly focuses on discrete control tasks where the action space is discrete. In this work, we develop a quantum reinforcement learning algorithm based on soft actor-critic, one of the state-of-the-art methods for continuous control. Specifically, we use a hybrid quantum-classical policy network consisting of a variational quantum circuit and a classical artificial neural network. Tested on a standard reinforcement learning benchmark, we show that this quantum version of soft actor-critic is comparable with the original soft actor-critic while using far fewer adjustable parameters. Furthermore, we analyze the effect of different hyperparameters and policy network architectures, pointing out the importance of architecture design for quantum reinforcement learning.
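To make the hybrid-policy idea concrete, here is a deliberately tiny NumPy sketch: each state dimension is encoded as a single-qubit rotation, a trainable rotation layer follows, and the Pauli-Z expectations feed a classical linear head producing the mean and log-std of a Gaussian policy. The circuit (entanglement-free, one qubit per feature) and the weight shapes are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

def ry(theta):
    """Single-qubit Y-rotation gate."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def quantum_features(state_angles, circuit_params):
    """Encode each state dimension on its own qubit, apply a trainable RY
    layer, and read out the Pauli-Z expectation per qubit."""
    feats = []
    for angle, theta in zip(state_angles, circuit_params):
        psi = ry(theta) @ ry(angle) @ np.array([1.0, 0.0])  # |0> -> encoded -> trained
        feats.append(psi[0] ** 2 - psi[1] ** 2)              # <Z> = cos(angle + theta)
    return np.array(feats)

def hybrid_policy(state, circuit_params, W_mu, W_logstd):
    """Hybrid quantum-classical Gaussian policy head: quantum features feed a
    classical linear layer that outputs action mean and log-std, as in a
    soft actor-critic style squashed Gaussian policy."""
    q = quantum_features(state, circuit_params)
    return np.tanh(W_mu @ q), W_logstd @ q

rng = np.random.default_rng(0)
state = rng.uniform(-np.pi, np.pi, size=4)
theta = rng.uniform(-np.pi, np.pi, size=4)
mu, log_std = hybrid_policy(state, theta,
                            rng.standard_normal((2, 4)) * 0.1,
                            rng.standard_normal((2, 4)) * 0.1)
print(mu, log_std)
```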
Camouflaged object detection (COD), which aims to detect objects whose patterns (e.g., texture, intensity, color) are similar to their surroundings, has recently attracted growing research interest. Since camouflaged objects often present very ambiguous boundaries, determining object locations as well as their weak boundaries is challenging and is also the key to this task. Inspired by the biological visual perception process by which a human observer discovers camouflaged objects, this paper proposes a novel edge-based reversible re-calibration network named ERRNet. Our model is characterized by two innovative designs, namely Selective Edge Aggregation (SEA) and the Reversible Re-calibration Unit (RRU), which aim to model the visual perception behavior and achieve effective edge priors and cross-comparison between potentially camouflaged regions and the background. More importantly, the RRU incorporates more comprehensive information compared with existing COD models. Experimental results show that ERRNet outperforms existing cutting-edge baselines on three COD datasets and five medical image segmentation datasets. In particular, compared with the existing top-1 model SINet, ERRNet significantly improves performance by ~6% (mean E-measure) at a notably high speed (79.3 FPS), showing that ERRNet could be a general and robust solution for the COD task.
Dexterous manipulation remains an open problem in robotics. To coordinate the efforts of the research community towards tackling this problem, we propose a shared benchmark. We designed and built robotic platforms that are hosted at the MPI for Intelligent Systems and can be accessed remotely. Each platform consists of three robotic fingers capable of dexterous object manipulation. Users are able to control the platforms remotely by submitting code that is executed automatically, akin to a computational cluster. Using this setup, i) we host robotics competitions, where teams from anywhere in the world access our platforms to tackle challenging tasks, ii) we publish the datasets collected during these competitions (comprising hundreds of robot hours), and iii) we give researchers access to these platforms for their own projects.